[GLUTEN-3547][CORE] [VL] Add native parquet writer in spark 3.4 #3690
thank you @JkSelf
@@ -267,7 +268,7 @@ case class FallbackEmptySchemaRelation() extends Rule[SparkPlan] {
           TransformHints.tagNotTransformable(p, "at least one of its children has empty output")
           p.children.foreach {
             child =>
-              if (child.output.isEmpty) {
+              if (child.output.isEmpty && !child.isInstanceOf[WriteFilesExec]) {
What's the reason for making this change? Thanks.
@zhztheplayer The output of WriteFilesExec is empty, so without this exception the plan would fall back.
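To make the exchange above concrete, here is a minimal sketch of the per-child condition before and after the change. The `Plan`, `Scan`, and `WriteFiles` types are hypothetical stand-ins for Spark's plan nodes, introduced only so the example is self-contained; the surrounding rule in Gluten is only partly visible in the hunk and is not modelled here.

```scala
object EmptySchemaFallbackSketch {
  // Hypothetical, simplified stand-ins for Spark plan nodes (not the real classes).
  sealed trait Plan {
    def output: Seq[String]
  }

  // An ordinary node that exposes whatever columns it is given.
  final case class Scan(output: Seq[String]) extends Plan

  // Models WriteFilesExec: it consumes its child's rows and exposes no output columns.
  final case class WriteFiles(child: Plan) extends Plan {
    val output: Seq[String] = Nil
  }

  // Old condition: every child with an empty output is tagged as not transformable.
  def taggedBefore(child: Plan): Boolean =
    child.output.isEmpty

  // New condition: an empty output is tolerated when the child is the write node.
  def taggedAfter(child: Plan): Boolean =
    child.output.isEmpty && !child.isInstanceOf[WriteFiles]
}
```

Under these assumptions, a `WriteFiles` child is tagged by the old condition and left alone by the new one, while any other child with an empty output is still tagged.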
val Array(major, minor, _) = SparkShimLoader.getSparkShims.getShimDescriptor.toString.split('.')
if (major.toInt > 3 || (major.toInt == 3 && (minor.toInt >= 4))) {
  List()
} else {
  List(spark => NativeWritePostRule(spark))
}
Can we somehow do this in the shim layer? Would that require a lot of effort?
@zhztheplayer Good catch. I will move this into the shim layer in a follow-up PR. Thanks.
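For reference, the version gate discussed in this thread can be isolated as a small standalone sketch. It assumes the shim descriptor renders as a three-part `major.minor.patch` string; the object name, the function names, and the use of a plain string in place of the real rule builder are hypothetical.

```scala
object SparkVersionGate {
  // True for Spark 3.4 and later. Assumes a three-part "major.minor.patch" string;
  // any other shape fails the Array pattern and throws a MatchError.
  def hasWriteFilesExec(version: String): Boolean = {
    val Array(major, minor, _) = version.split('.')
    major.toInt > 3 || (major.toInt == 3 && minor.toInt >= 4)
  }

  // A string stands in for the rule builder: on Spark 3.4+ no post rule is
  // registered, on older versions the post rule is kept.
  def postRules(version: String): List[String] =
    if (hasWriteFilesExec(version)) List() else List("NativeWritePostRule")
}
```

Moving the decision into a shim, as suggested above, would replace the string parsing with a per-version override, so the caller would no longer need to inspect version numbers.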
@JkSelf Just want to clarify: the recommended way to use the native Parquet writer is to use
@WangGuangxin Yes, you are right.
What changes were proposed in this pull request?
Since Spark 3.4 introduced the WriteFilesExec operator to facilitate write operations (SPARK-41708), we can now use this operator to enable native Parquet write. This PR introduces WriteFilesExecTransformer to delegate the write to the Velox TableWriteNode, and adds ColumnarWriteFilesExec, which implements the doExecuteWrite() method by converting RDD[ColumnarBatch] to RDD[WriterCommitMessage]. Due to the differences between vanilla Spark and Velox, this PR depends on the following three upstream Velox PRs:
facebookincubator/velox#8089
facebookincubator/velox#8090
facebookincubator/velox#8091
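The conversion from columnar batches to commit messages described above can be pictured with a rough sketch. Everything in it is an assumption made for illustration: `ResultBatch` and `CommitMessage` are hypothetical stand-ins for ColumnarBatch and WriterCommitMessage, the fields they carry are invented, and nested `Seq`s stand in for RDD partitions. It is not the PR's implementation.

```scala
object WriteCommitSketch {
  // Hypothetical stand-in for a batch returned by the native writer; here it
  // simply reports which files were written and how many rows they hold.
  final case class ResultBatch(files: Seq[String], numRows: Long)

  // Hypothetical stand-in for Spark's WriterCommitMessage.
  final case class CommitMessage(files: Seq[String], numRows: Long)

  // One partition: collapse all result batches into a single commit message.
  def commitPartition(batches: Iterator[ResultBatch]): CommitMessage =
    batches.foldLeft(CommitMessage(Nil, 0L)) { (acc, b) =>
      CommitMessage(acc.files ++ b.files, acc.numRows + b.numRows)
    }

  // Each inner Seq models one partition; with a real RDD this step would be
  // a per-partition transformation instead of a map over a Seq.
  def doExecuteWrite(partitions: Seq[Seq[ResultBatch]]): Seq[CommitMessage] =
    partitions.map(p => commitPartition(p.iterator))
}
```

The point of the shape is that each partition yields exactly one message for the driver to commit, regardless of how many batches the native writer returned.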
Limitations and failed unit tests are recorded here.
How was this patch tested?
Enabled the spark.gluten.sql.native.writer.enabled config in the Spark 3.4 unit tests.